Automatic phrase boundary labeling of speech synthesis database using context-dependent HMMs and n-gram prior distributions
نویسندگان
چکیده
This paper presents an automatic phrase boundary labeling method for speech synthesis database annotation using contextdependent hidden Markov models (CD-HMMs) and n-gram prior distributions. At training stage, CD-HMMs are built to describe the conditional distribution of acoustic features given phonetic label and phrase boundary. In addition, n-gram models are estimated to represent the prior distributions of the phrase boundaries to be predicted. At decoding stage, the CDHMMs and n-gram models are combined to predict the phrase boundaries by Viterbi decoding under maximum a posteriori (MAP) criterion. In our experiments, the proposed method utilizing context-dependent bigram prior distributions improved the F-score of phrase boundary labeling from 72.2% to 79.6% on the Boston University Radio News Corpus (BURNC), and from 69.6% to 81.0% on the Blizzard Challenge 2007 database respectively, comparing with the method using only acoustic models.
منابع مشابه
Hidden Markov models with context-sensitive observations for grapheme-to-phoneme conversion
Hidden Markov models (HMMs) have proven useful in various aspects of speech technology from automatic speech recognition through speech synthesis, speech segmentation and grapheme-to-phoneme conversion to part-of-speech tagging. Traditionally, context is modelled at the hidden states in the form of context-dependent models. This paper constitutes an extension to this approach; the underlying co...
متن کاملAccent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling
This paper proposes an automatic prosodic labeling technique for constructing speech database used for speech synthesis. In the corpus-based Japanese speech synthesis, it is essential to use annotated speech data with prosodic information such as phrase boundaries and accent types. However, manual annotation is generally time-consuming and expensive. To overcome this problem, we propose an esti...
متن کاملPerceptually-Related F0 Parameters for Automatic Classification of Phrase Final Tones
Automatic labeling of prosodic features is an important topic when constructing large speech databases for speech synthesis or analysis purposes. Perceptually-related F0 parameters are proposed with the aim of automatically classifying phrase final tones. Analyses are conducted to verify how consistently subjects are able to categorize phrase final tones, and how perceptual features are related...
متن کاملSimultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM. In the system, pitch and state duration are modeled by multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for spectral parameter, pitch parameter and the state du...
متن کاملAn Intonational Phrase Boundary and Pitch Accent Dependent Speech Recognizer
Does prosody help word recognition? In this paper, we propose a novel probabilistic framework in which word and phoneme are dependent on prosody in a way that improves word recognition. We describe the idea of prosody dependent speech recognition by building a prosody dependent speech recognizer that conditions word and phoneme models on two important prosodic variables: intonational phrase bou...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015